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1. INTRODUCTION 

Generally, fast direct solvers (FDS)* are not directly applicable to a non- 
separable elliptic partial differential equation (PDE')i This' limitation, however, 
can be circumvented by a semi-direct procedure, i.e., an iterative procedure using 
FDS algorithms. In this paper, we present. an efficient semi-direct procedure which 
is easy to implement and applicable to a variety of boundary conditions. The cur- 
rent procedure also possesses other highly desirable properties, i.e., (1) the con- 
vergence rate does not decrease with an increase of grid cell aspect ratio, and (2) 
the convergence rate can be estimated using the coefficients of the PDE being solved. 

Many elliptic PDE encountered in flow problems can be expressed as 

Lu = h (1.1) 

where L is a nonseparable second-order linear elliptic operator, u the dependent 
variable, and h a given source term. Equation (1.1) may be solved with the 
iterative procedure: 

v 2 (u n+1 - u n ) = — x (Lu n - h) (n = 0,1; 2,.’..) (1.2) 

Here t is a nonzero relaxation factor and v 2 the Laplacian which can be directly- 
inverted by a FDS. In the previous works 2-4 using Eq. (1.2), t is treated as. a 
constant and the iteration is accelerated by an optimal choice of t. In this study, 
we consider t as a spatially varying parameter. The local value of t is chosen to 
optimize the local convergence rate which is evaluated using a von Neumann analysis. 


STABILITY ANALYSIS 


As an initial step, it is assumed that t is a constant and 
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where a, b, and c are constants. Using a uniform grid with grid intervals ax and 
Ay, the central difference form of Eq. (1.2) at a grid point (i,j) is 
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Here hjj is the source term and the operators P and L, respectively, are 
defined by 

p(u L)' Ux ) r? ( u iM,j* u i-i,j- 2u t,j) , ( 4 >)" 2 , u ijnS,o-r 2u ",j ) ' ti'n 

“i-i, r Zu i.j)* ct * , ) -2tu ",j+if “u-i- 2u u’. * 

•. . J., ♦“1-1J-1 -“"-U»lV’ (2 ' 4) 

The convergence rate of Eq. (2.2) will be" analyzed assuming ; 

e 'J j 5 u i,j -u i,j = enex Pt I (ik x ax+jk y Ay)], I =vQ , (2.5) 

where Ui^j is the converged solution, ej^ the error amplitude and k = (kv>ky) 
the propagation vector. Assuming that IT ^ 0 and using the following definitions: 
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e x = (k x Ax)/2, a x = (sine x )/&x, e y = (k y Ay)/2, a y = (sine y )/Ay (2.6) 

S x S aJU'/H'/l 112 , S y 5 (2.7) 

then Eqs. (2.2) and (2.5) imply that 

e n+1 /e° = E(t.(c) = 1— rG(ic) (Jc * 0) (2.8) 

with G(fc) = a(s ) 2 + c(s ) 2 + 2bs s cose cose (2.9) 

a y a y a y 


It should be noted that the von Neumann analysis (along with the condition £ £ 0) can 
be fully justified if it is applied to a grid with periodic boundary conditions. 

Let om ax and o m j n , respectively, be the greatest and the smallest eigen- 
values of the symmetric and positive-definite matrix (u, v = 1, 2) 

A = (<* pv )» <*l]= a » «i2 =a 21 = k’ a 22 =c (2-10) 

Then it can be shown that^ 
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with the understanding that the bounds o max and o m i n are sharp if all the 
allowable T(^0) are considered. As_a result, the asymptotic error amplification 
factor E(t)(= the supremum of |e(t, IT) | for a given t) will reach its minimum 


E q 5 E(t q ) = (r - l)/(r + 1) < 1 (2.12) 


when t = t =the optimal relaxation factor = 2/(o • + a . ) = 2/ ( a + c) (2.13) 

o max min 

Here z is the condition number 2o m ax/°mjn 

Equations (2.1)— (2.13) can be easily generalized for N-dimensional problems. 

The only exception is Eq. (2.13) where the last equality sign is not valid if N>2. 

3. LOCAL RELAXATION 


The variable coefficient (VC) version of the numerical procedure presented in 
Section 2 is obtained by replacing the constant coefficients a, b, c, t, x 0 , o max , 
and o™^ n in Eqs. (2.2) to (2. A) and (2.13), respectively, with the grid point 
dependent coefficients a^, b^, c^, Tg, T 0>ij , o m ax,ij and o min>ij . Obviously, 
the VC version is well defined at ail interior grid points. For a grid point on a 
periodic or Neumann boundary, it can also be defined using the periodic condition or 
an extrapolation technique explained in Ref. 6. 

The current procedure can be modified to solve PDE with 


L = L ' (x,y) = 


3X 


(p(x,y) jy) + ly (q(x,y) ly) 


(3.1) 

L = L' (x,y) , 


where p and q are arbitrary positive functions of x _and y. With 
the VC version of Eq. (2.2) can be obtained by replacing "L ( u ° # j ) with^ 

0(u i,j) 5 ( lx >" 2[ P(i-l/2» “i-l.r p (i+l/2)j Vl..j- (p (i-l/2)j tp (i*l/2)j )u ?,j ] * 

(a y ) Z C < «i ( j-l/2) u ", ( j+l/2) u i , j+l _(q i ( j-l/2)* q 1 < j*l/2) >u ? , (3 ' 2) 
In case that the values of p and q do not vary greatly from one grid point to its 
neighbors, then L' (u? ^ )=L(uV ) assuming a^ = p^, c^. = q^ 
observation coupled with Eq. (2.13) leads us to the assumption: 


and b- , =0. 

* J 


This 


r . . = t ' . . r 2/ (p ■ + q . . ) 

ij o,i, J VK iJ M ij' 
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(3.3) 
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4. SCALING 


Assuming that the definition of the operator P 


is broadened as 


P(u n . ) = g (ax) -2 (u? - ,+u? - ,-2u? .)+g (ay) 2 (u? . . + u l ? . ,-2u? .) 

i , j * i + i,j t-I.j V i,j + i 1,3-1 i,j 


(4.1) 


with g x and gy positive constants, then the only modifications required for Secw 2 
are to replace the coefficients a x , a v , a, b, and c in Eqs. (2.7), (2.9), (2.10) 
and (2.13), respectively, with v 7gja x , -v/§y a y» a/ g x , and c/gy. Since 

g„ and gy are positive constants, their appearance in Eq. (4.1) does not increase 
the difficulty of inverting P. However, their introduction into the current itera- 
tive procedure does have an effect on the convergence rate. As a result, the current 
iterative procedure can be accelerated considerably by a proper ^choice of g x and 
g„. Obviously, this scaling technique can also be used on the solutioh of POE with 
variable coefficients as long as the scaling coefficients g x and gy remain 
positive constants. 


a . 


NUMERICAL EVALUATION 


To facilitate this discussion, we begin with the following preliminaries: 

(a) the residual norm r(n) and error norm e(n) after n iterations are defined by: 


\2-,l/2 


(n) = [£ (L(u" -)-h ) Z ] 1/Z and e(n) e [£ (“" r u T i> ] 
i , j ,J ,J i,j ’ J 

’ j is the machine accuracy solution. Moreover, we define 
0 p (n) = -1og 1Q [r(n)/r(0)] and 0 e (n) = -log 1Q [e(n)/e(0)] 


(5.1) 


whe>-e 


(b) We assume that u? . =^0 at all grid points where Ui j's.are unknowns. n 
As a result, e(o)t J l|u”ll arid 0 e (n) is the number of correct digits in u?. .. 

' t J 

(c) The coefficient E« y defined by the VC version of Eq. (2.12) may be inter- 
preted as an asympt6tic error amplification factor. As a result, the value of 
0 e (n) (and 0 r (n) if 0 r (n) — • 0 e ( n ) ) may be predicted by the parameter 

Ot(n) = -n logig (the supremum of all. ,E 0 y) (5.3) 

where E 0j y is evaluated assuming ay = py, by = 0 and cy = qy 

if L = L'(x,y). Alternatively, the value of 0 e (n) or 0 r (n) may be 
predicted by Ot(n) = the limit of Ot(n) as the grid cell size approaches zero. 

The domain for all numerical problems is assumed to be l>x>0 and l>y>0. Moreover, 
the iterative procedure is executed using aan FDE code 8 with g x = gy = 1 . 

The first group of POE to be studied in this section includes: 


(d) 


? ? ? ? ??? 7 

(l + 2x +2y ) 3 u/ax £+ ( 1+x^+yy a u/3y =1 
( l + 2x^+2y^) a^u/3x^+( l+x^+y Z ) 3^u/3X3y + ( l + x^ + y^) 3 Z u/3y^=l 


|j{{l+(x 2 +y Z ) f )--|j]^|yC{2+(x 2 +y Z ) t } Jy]=l, t=2,4, 6, 8 


(5.4) 

(5.5) 

(5.6) 


Ten numerical problems associated with Eqs. (5.4.)— (5.6) are defined in Table 1. The 
parameters MX and MY, respectively, are the number of grid intervals in the x and 
y directions. It is assumed that u = 0 at all boundaries. 

Problems 1-3 are associated with Eq. (5.4). They differ only in the grid cell 
size and aspect ratio. As shown in Table 1, for all three problems, the value of 
0 t (20) or 0 t (20) provides a conservative estimate of 0 r (20) (since 0 e (n) ~ 0-(n), 
the values of Op(n) are not listed). It is also seen that the very large aspect 
ratio (16:1) of the grid cell in problem 3 does not cause any substantial reduction 
in convergence rate. Problems 4-6 represent a parallel study for Eq. (5.5). It is 
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evident that the appearance of a cross derivative term in Eq. (5.5) causes_an 
increase in the conservatism of the theoretical estimate made by Ot(n) or Ot(n)i 
This is particularly true if the grid cell size or aspect ratio is large. For 
problems 1-6, the convergence histories are closely represented by straight lines (as 
an example, see Fig. 1) as would be expected from our theoretical study. 

Equations (5.6) belong to the class of POE defined by Eq. (3.1). It is evident 
that the variation of the coefficients p and q increases progressively as one 
goes from 4 = 2 to % = 4 and so on. For t = 8, the increase in the values 
of p and q from one corner (x=y=0) to another corner (x=y=l) on the unit square 
is of the order of 100 times. As shown in Fin. 2, the current method is still useful 
in this extreme case. The robustness of this algorithm is most evident in its 
ability to reverse the trend toward divergence during the first few iterations. 

The second group of POE to be studied includes 


■|j{{l + (x + y)^)^|j] + -|y£tl + sin^(x + y)>^-|^']=h 1 (x,y) (5.7) 

{ l+£( x 4 +y 4 ) > 2 !£| + !y£ { 1+Jf( x 4 +y 4 ) } 2 |y]= h 2 (x ,y ) (5.8) 


Here h^(x,y) and h^x.y) along with the boundary values of u are chosen such that 


u = «l(x,y) = sinx siny and u = U2(x,y)=[x(l-x)y(l-y)] z (5.9) 

respectively, are the exact solutions of Eq. (5.7) and (5.8). Both Eqs. (5.7) and 
(5.8) are solved numerically assuming MX=MY=16. Both numerical problems are test 
problems used by Bank 4 . Compared with the current value of 0^(10) = 3.47, the 
values obtained by Bank for Eq. (5.7) are 3.49 without using his scaling technique 
and 3.87, 4.81, and 6.76 using different scaling functions. Since the operator used 
by Bank is a general separable operator, the computation time required for its 
inversion is about six times that required for the current Poisson-like operator (p. 

5 of ref. 1). As a result, the current method is at least a factor of 2 more 
efficient even without using our scaling technique. For Eq. (5.8), compared with the 
current value of 0 e (10) = 8.59, the values reported by Bank are 5.88 without using 
scaling technique and 14.79 if a scaling function is used. Equation (5.8) was also 
solved by Concus and Golub^ and the results are comparable with ours. However 
their method is applicable only when p(x,y) = q(x,y) as in the case of Eq. (5.8). 


6. APPLICATION TO A 3-D INTERNAL FLOW PROBLEM 


In a numerical study (to be published) of 3-D incompressible shear flow in a 
turning channel (Fig. 3) , the el 1 ipt ic PDE 

3^u/3x^+a^u/3y^ + n(x,y)a^u/3Z^ = h, (2>x>-2, 0.75>y>0.65, 0.12^>0) (6.1) 

is solved in computational space. Here h is a known source term and 

n(x,y) = (cosh(wx) + cos(ny) )/(coshUx) - cosUy)) > 0 (6-2) 

It is assumed that u = 0 at x = 2 and the normal derivative of u vanishes at 
other boundaries. Since Eq. (6.1) has a form specified by Eq. (1.1) and the 3-D 
version of Eq. (2.1), it was solved numerically using the current iterative method 
with a. 144 x 12 x 12 uniform grid. To accelerate the convergence, the scaling 
coefficients of the 3-D operator P (see Eq. (4.1)) are given by 

9 X = 9 y = 1, 9 3 = Vn max • n max * 0.4135 (6.3) 

where r^,a X (=0.9966) and ryp^ n (=0.1716), respectively, are the maximum and the 
minjmum of n in the domain of Eq. (6.1). It should be noted that the maximum value 
of 0 1 ( 1 ) (= 0.415) is reached if the values of g x , gy, and g z are chosen 
according to Eq. (6.3). 
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Equation (6.1) was numerically solved many times with different source terms h 

Without exception, it was found that, after four iterations, the slope of 0 r (n) 

vs n curve is predicted by the value of 0^(1) to within ten percent. 
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